Add adjoint differentiation of tfq.math.inner_product() #477
Hi Jae, this is amazing work!! Nice job adapting the adjoint method code to work on inner product.
I think there are some tweaks that need to be made to the code before we can merge:
- The forward pass output shape is [batch, n_others], much like our expectation op, which has forward pass shape [batch, n_ops]. The gradient for our expectation op is the total gradient (i.e. $\sum_{op_i}$ [batch, n_ops, n_symbols] => [batch, n_symbols]). In this case you aren't returning the total gradient summed over $other_i$. Doing so would let you simplify the computation in the C++ op: you could have something like AccumulateOperators that lets you accumulate all other_programs together, so you can compute the total gradient for all states in one pass instead of one by one.
- I know the way TensorFlow handles complex gradients can be a little complicated, so I was wondering if you think it might make sense to set up a few tests with a small TF compute graph, using raw tf operations (no TFQ) to implement a small circuit inner product calculation, and then compare the gradients that come out of that compute graph with the ones we produce from our op.
- I think we may be able to remove a lot of the boilerplate function code for the gradient by just using the @RegisterGradient decorator.
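To make the first two points concrete, here is a minimal pure-Python sketch, not the TFQ implementation. It assumes a single-qubit stand-in state RX(theta)|0>, the convention out[j] = <psi(theta)|other_j>, and hypothetical upstream weights; all helper names (rx_state, d_rx_state, inner, accumulate) are made up for illustration. It shows that accumulating the weighted other states first gives the same total gradient in one pass, and it checks that result against a finite difference, in the spirit of the raw-op comparison test.

```python
import math


def rx_state(theta):
    """Amplitudes of RX(theta)|0> (illustrative one-qubit circuit)."""
    return [complex(math.cos(theta / 2)), -1j * math.sin(theta / 2)]


def d_rx_state(theta):
    """Analytic derivative of rx_state with respect to theta."""
    return [complex(-0.5 * math.sin(theta / 2)), -0.5j * math.cos(theta / 2)]


def inner(bra, ket):
    """<bra|ket>, conjugating the first argument."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))


def accumulate(states, weights):
    """Weighted sum of kets: sum_j weights[j] * states[j]."""
    dim = len(states[0])
    return [sum(w * s[k] for w, s in zip(weights, states)) for k in range(dim)]


theta = 0.7
s = 1 / math.sqrt(2)
others = [[1 + 0j, 0j], [0j, 1 + 0j], [complex(s), complex(0, s)]]
# Hypothetical upstream weights, one per other state.
weights = [0.5 + 0j, -1.0 + 0.25j, 2.0 + 0j]

dpsi = d_rx_state(theta)

# One by one: a separate inner product per other state, then a weighted sum.
per_other = [inner(dpsi, phi) for phi in others]
total_one_by_one = sum(w * g for w, g in zip(weights, per_other))

# Single pass: accumulate the other states first, then one inner product.
total_single_pass = inner(dpsi, accumulate(others, weights))


def weighted_output(t):
    """Weighted sum of the forward-pass inner products at angle t."""
    psi = rx_state(t)
    return sum(w * inner(psi, phi) for w, phi in zip(weights, others))


# Central finite difference as an independent reference.
eps = 1e-6
finite_diff = (weighted_output(theta + eps) - weighted_output(theta - eps)) / (2 * eps)
```

The single-pass form works because the inner product is linear in the ket, so the weights can be folded into the other states before the derivative state is contracted. How the weights relate to TensorFlow's complex gradient convention is not addressed here.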
MichaelBroughton left a comment
Things are looking pretty good now! A few more small tweaks and we should be ready to go. It will be exciting to start putting circuit inner product calculations inside of compute graphs and Keras models!
PTAL.
This PR adds adjoint differentiation of tfq.math.inner_product().
Since inner_product() has the specific output tensor shape [batch_size, inner_size, n_symbols], this PR implements a new adjoint gradient op for it. (The original adjoint gradient op has only a [batch_size, n_symbols] output, so one more nested for-loop is added inside this new adjoint gradient op.)
For edge cases, this PR handles them as follows:
[1] empty symbols
The following code shows TensorFlow's default behavior for gradients with respect to empty symbols.
So, this PR adds this behavior on the Python side, in the def grad(dy) function returned from the tf.custom_gradient() function. For the C++ inner_product_adj_grad_op itself, this PR throws an error if the symbols are empty.
[2] empty circuits
I suspect that our current definition is controversial. For example, we can say that a given empty circuit is just |0>. Then the output inner product is 1.0 when both circuits are empty, <0|0>, and its gradient is 0.0. However, if only other_programs is empty, we may have <psi(x)|0> and <dpsi(x)/dx|0>, which are not constants in general. That's why this PR doesn't just return a default value when a circuit is empty.
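As a small worked example of the distinction, here is a pure-Python sketch that treats an empty circuit as |0> and uses a hypothetical one-qubit circuit RX(x) for psi(x). The helper names are made up for illustration and none of this is the op's code. Under those assumptions <0|0> is constant, while <psi(x)|0> = cos(x/2) has derivative -sin(x/2)/2, which is nonzero for most x.

```python
import math

ZERO = [1 + 0j, 0j]  # an empty circuit, read as the state |0>


def rx_state(x):
    """Amplitudes of RX(x)|0> (illustrative parameterized circuit)."""
    return [complex(math.cos(x / 2)), -1j * math.sin(x / 2)]


def inner(bra, ket):
    """<bra|ket>, conjugating the first argument."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))


# Both circuits empty: <0|0> has no dependence on x, so its gradient is 0.
both_empty = inner(ZERO, ZERO)


def overlap_with_empty_other(x):
    """<psi(x)|0>, where only the other circuit is empty."""
    return inner(rx_state(x), ZERO)


x = 0.9
eps = 1e-6
# Central finite difference of <psi(x)|0> with respect to x.
numeric_grad = (
    overlap_with_empty_other(x + eps) - overlap_with_empty_other(x - eps)
) / (2 * eps)
# Closed form for this specific circuit: d/dx cos(x/2) = -sin(x/2)/2.
analytic_grad = -0.5 * math.sin(x / 2)
```

Returning a fixed default for every empty-circuit case would be correct for the first situation but would drop the nonzero gradient in the second.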